
INRIA a CCSD electronic archive server

Maximum likelihood models and algorithms for gene tree evolution with duplications and losses

Author: AP Martin
B Ma
Gordon J Burleigh
J Ruan
JA Cotton
JA Cotton
JB Slowinski
JH Degnan
JP Demuth
JP Doyon
JP Doyon
JS Taylor
L Arvestad
L Arvestad
L Arvestad
L Liu
L Zhang
M Goodman
M Lynch
MA Bender
MJ Sanderson
MR Garey
MR McGowen
O Akerborg
Oliver Eulenstein
P Górecki
P Górecki
P Górecki
Pawel Górecki
R Redon
RD Page
RDM Page
S Ohno
SB Hedges
W Maddison
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: The abundance of new genomic data provides the opportunity to map the location of gene duplication and loss events on a species phylogeny. The first methods for mapping gene duplications and losses were based on a parsimony criterion, finding the mapping that minimizes the number of duplication and loss events. Probabilistic modeling of gene duplication and loss is relatively new and has largely focused on birth-death processes. Results: We introduce a new maximum likelihood model that estimates the speciation and gene duplication and loss events in a gene tree within a species tree with branch lengths. We also provide an algorithm, efficient in practice, that computes optimal evolutionary scenarios for this model. We implemented the algorithm in the program DrML and verified its performance with empirical and simulated data. Conclusions: In test data sets, DrML finds optimal gene duplication and loss scenarios within minutes, even when the gene trees contain sequences from several hundred species. In many cases, these optimal scenarios differ from the lca-mapping that results from a parsimony gene tree reconciliation. Thus, DrML provides a new, practical statistical framework on which to study gene duplication.
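For readers unfamiliar with the lca-mapping mentioned in this abstract, here is a minimal sketch of the parsimony baseline, not of DrML's maximum likelihood algorithm. It assumes a rooted binary gene tree written as nested tuples whose leaves are labelled directly with species names, and a species tree given as a child-to-parent dictionary. The names `SPECIES_PARENT`, `lca` and `lca_map` are hypothetical and chosen only for illustration.

```python
# Hypothetical species tree ((A,B)AB,C)ABC as a child -> parent map.
SPECIES_PARENT = {"A": "AB", "B": "AB", "AB": "ABC", "C": "ABC"}


def ancestors(node, parent):
    """Return the path from node up to the root, node included."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(a, b, parent):
    """Least common ancestor of species nodes a and b."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def lca_map(gene_tree, parent, duplications):
    """Map a gene tree node to the species tree and record duplications.

    A gene node is counted as a duplication when it maps to the same
    species node as at least one of its children.
    """
    if isinstance(gene_tree, str):  # leaf: label is the species name
        return gene_tree
    left, right = gene_tree
    m_left = lca_map(left, parent, duplications)
    m_right = lca_map(right, parent, duplications)
    m = lca(m_left, m_right, parent)
    if m == m_left or m == m_right:
        duplications.append(m)
    return m
```

Under these assumptions, the gene tree `(("A", "B"), ("A", "C"))` maps its root to `ABC` and records one duplication there, while `(("A", "B"), "C")` records none.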

Fast computation of distance estimators

Author: A Rambaut
D Swofford
F Barker
H Kishino
I Elias
Isaac Elias
J Felsenstein
J Felsenstein
Jens Lagergren
K Tamura
K Tuplin
L Arvestad
M Kimura
N Saitou
T Jukes
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Some distance methods are among the most commonly used methods for reconstructing phylogenetic trees from sequence data. The input to a distance method is a distance matrix, containing estimated pairwise distances between all pairs of taxa. Distance methods themselves are often fast, e.g., the famous and popular Neighbor Joining (NJ) algorithm reconstructs a phylogeny of n taxa in time O(n^3). Unfortunately, the fastest practical algorithms known for computing the distance matrix from n sequences of length l take time proportional to l·n^2. Since the sequence length typically is much larger than the number of taxa, the distance estimation is the bottleneck in phylogeny reconstruction. This bottleneck is especially apparent in reconstruction of large phylogenies or in applications where many trees have to be reconstructed, e.g., bootstrapping and genome-wide applications. RESULTS: We give an advanced algorithm for computing the number of mutational events between DNA sequences which is significantly faster than both Phylip and Paup. Moreover, we give a new method for estimating pairwise distances between sequences which contain ambiguity symbols. This new method is shown to be more accurate as well as faster than earlier methods. CONCLUSION: Our novel algorithm for computing distance estimators provides a valuable tool in phylogeny reconstruction. Since the running time of our distance estimation algorithm is comparable to that of most distance methods, the previous bottleneck is removed. All distance methods, such as NJ, require a distance matrix as input and, hence, our novel algorithm significantly improves the overall running time of all distance methods. In particular, we show for real-world biological applications how the running time of phylogeny reconstruction using NJ is improved from a matter of hours to a matter of seconds.
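To make the l·n^2 cost concrete, the following is a minimal sketch of the naive baseline that the abstract describes as the bottleneck. It is not the paper's fast algorithm. It assumes aligned sequences of equal length with no ambiguity symbols, and it uses the standard Jukes-Cantor correction as one example of a distance estimator. The function names are hypothetical.

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned sites at which two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) * ln(1 - 4p/3)."""
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs):
    """Naive pairwise distance matrix: n*(n-1)/2 pairs, each scanned in O(l)."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = jukes_cantor(p_distance(seqs[i], seqs[j]))
    return d
```

Every pair of sequences is scanned site by site, so the total work grows with the sequence length times the square of the number of taxa, which is the term the abstract contrasts with the O(n^3) cost of NJ.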

MAP20, a Microtubule-Associated Protein in the Secondary Cell Walls of Hybrid Aspen, Is a Target of the Cellulose Synthesis Inhibitor 2,6-Dichlorobenzonitrile

Author: Alex S. Rajangam
Björn Sundberg
Christian J.-L. Brown
Christina Divne
Ewa Mellerowicz
Gea Guerriero
Henrik Aspeborg
Ines Ezcurra
Kristina Blomqvist
Lars Arvestad
Manoj Kumar
Podjamas Pansri
Sophia Hober
Tuula T. Teeri
Vincent Bulone
Publication venue: American Society of Plant Biologists (ASPB)

PhyloPattern: regular expressions to identify complex patterns in phylogenetic trees

Author: A Bateman
A Levasseur
AK Wright
BE Engelhardt
CA Paulding
CM Zmasek
D Barker
D Durand
DH Huson
DHD Warren
J Felsenstein
J McCarthy
J Ruan
JD Thompson
JF Dufayard
JS Farris
Julie D Thompson
L Arvestad
N Krishnamurthy
O Sakarya
P Gouret
Philippe Gouret
Pierre Pontarotti
RG Beiko
T Blomme
T Dobzhansky
TJ Hubbard
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: To effectively apply evolutionary concepts in genome-scale studies, large numbers of phylogenetic trees have to be automatically analysed, at a level approaching human expertise. Complex architectures must be recognized within the trees, so that associated information can be extracted. Results: Here, we present a new software library, PhyloPattern, for automating tree manipulations and analysis. PhyloPattern includes three main modules, which address essential tasks in high-throughput phylogenetic tree analysis: node annotation, pattern matching, and tree comparison. PhyloPattern thus allows the programmer to focus on: i) the use of predefined or user-defined annotation functions to perform immediate or deferred evaluation of node properties, ii) the search for user-defined patterns in large phylogenetic trees, iii) the pairwise comparison of trees by dynamically generating patterns from one tree and applying them to the other. Conclusion: PhyloPattern greatly simplifies and accelerates the work of the computer scientist in the evolutionary biology field. The library has been used to automatically identify phylogenetic evidence for domain shuffling or gene loss events in the evolutionary histories of protein sequences. However, any workflow that relies on phylogenetic tree analysis could be automated with PhyloPattern.

HAL AMU

HAL-Inserm

Gene-pseudogene evolution: a probabilistic approach

Author: A Heger
AC Marques
AN Khachane
Bengt Sennblad
C Esnault
C Jacq
C Schiff
D Benovoy
D Zheng
EM Muro
EP Nawrocki
ES Balakirev
G Bejerano
HS Rothenfluh
J Felsenstein
J Sjöstrand
Jens Lagergren
JP Bielawski
K Nowick
K Nowick
Katja Nowick
L Arvestad
L Arvestad
L Li
Lars Arvestad
M Goodman
N Saitou
O Mahmudi
O Podlaha
O Svensson
Owais Mahmudi
PM Harrison
PM Harrison
R Sudbrak
S Huntley
SB Hedges
SF Altschul
TA Gray
V Ranwez
Y Niimura
Y Niimura
Z Zhang
ZD Zhang
Ö Åkerborg
Publication venue: Springer Science and Business Media LLC

Infoscience - École polytechnique fédérale de Lausanne

Refining transcriptional regulatory networks using network evolutionary models and gene histories

Author: A Bhan
A Crombach
A Stark
A Tanay
AL Barabási
Bernard ME Moret
BME Moret
C Roth
CT Harbison
D Durand
DM Hillis
G Bourque
J Kim
J Yu
KP Murphy
L Arvestad
M Kanehisa
MM Babu
MM Babu
N Friedman
N Friedman
R Wang
RDM Page
S Liang
SA Teichmann
SY Kim
T Akutsu
T Chen
T Pupko
X Zhang
X Zhang
Xiuwei Zhang
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Computational inference of transcriptional regulatory networks remains a challenging problem, in part due to the lack of strong network models. In this paper we present evolutionary approaches to improve the inference of regulatory networks for a family of organisms by developing an evolutionary model for these networks and taking advantage of established phylogenetic relationships among these organisms. In previous work, we used a simple evolutionary model and provided extensive simulation results showing that phylogenetic information, combined with such a model, could be used to gain significant improvements in the performance of current inference algorithms. Results: In this paper, we extend the evolutionary model so as to take into account gene duplications and losses, which are viewed as major drivers in the evolution of regulatory networks. We show how to adapt our evolutionary approach to this new model and provide detailed simulation results, which show significant improvement on the reference network inference algorithms. Different evolutionary histories for gene duplications and losses are studied, showing that our adapted approach is feasible under a broad range of conditions. We also provide results on biological data (cis-regulatory modules for 12 species of Drosophila), confirming our simulation results.

Serveur académique lausannois

RecPhyloXML: a format for reconciled gene trees.

Author: Arigon Chifolleau A.M.
Arvestad L.
Bansal M.S.
Berry V.
Boussau B.
Chevenet F.
Comte N.
Daubin V.
Davín A.A.
Dessimoz C.
Duchemin W.
Dylus D.
Gence G.
Hasic D.
Mallo D.
Planel R.
Posada D.
Scornavacca C.
Szöllosi G.
Tannier É.
Zhang L.
Publication venue: Oxford University Press (OUP)
Publication date: 14/05/2018

A reconciliation is an annotation of the nodes of a gene tree with evolutionary events (for example, speciation, gene duplication, transfer, loss, etc.) along with a mapping onto a species tree. Many algorithms and software tools produce or use reconciliations, but often in different formats that vary in the types of events considered or in whether the species tree is dated. This complicates the comparison and communication between different programs. Here, we gather a consortium of software developers in gene tree species tree reconciliation to propose and endorse a format that aims to promote an integrative, albeit flexible, specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities. http://phylariane.univ-lyon1.fr/recphyloxml/

INRIA a CCSD electronic archive server

VMCMC: a graphical and statistical analysis tool for Markov chain Monte Carlo traces

Author: A Gelman
AJ Drummond
C Lakner
CB Albertin
DF Robinson
DM Hillis
F Ronquist
H Safavi-Hemami
HJ Thiébaux
J Sjöstrand
JA Nylander
JB Kruskal
JB Kruskal
Joel Sjöstrand
Jorge Miró
JP Huelsenbeck
L Gong
Lars Arvestad
M Plummer
MA Suchard
Mikael Bark
O Åkerborg
Raja H. Ali
Raja M. Abbas
RH Ali
S-i Eyun
Sayyed A. Muhammad
SB Hedges
SB Hedges
Syed M. Zubair
Publication venue: Springer Science and Business Media LLC

Public Library of Science (PLOS)

MACSE: Multiple Alignment of Coding SEquences Accounting for Frameshifts and Stop Codons

Author: A Löytynoja
B Chevreux
B Morgenstern
C Notredame
CNS Pedersen
D Huchon
D Przybylski
D Sankoff
D Zheng
DG Higgins
E Dermitzakis
Emmanuel J. P. Douzery
F Abascal
F Delsuc
Frédéric Delsuc
H Philippe
H Zhao
J Hein
J Kececioglu
J Kececioglu
J Raes
JD Thompson
K Katoh
KM Wong
L Arvestad
L Salmela
M Dayhoff
M Gouy
M Kircher
M Margulies
M Suyama
MT Gilbert
N Galtier
OR Bininda-Emonds
P Sneath
PJ Farabaugh
R Wernersson
RC Edgar
RC Edgar
RK Bradley
RR Stocsits
RW Meredith
S Henikoff
S Needleman
SF Altschul
SF Altschul
SS Steiger
Sébastien Harispe
T Smith
TA Demere
TJ Hubbard
TJ Wheeler
V Ranwez
Vincent Ranwez
William J. Murphy
X Guan
X Huang
Y Van de Peer
Publication venue: Public Library of Science
Publication date: 01/01/2011

Until now, the most efficient solution to align nucleotide sequences containing open reading frames was to use indirect procedures that align amino acid translations before reporting the inferred gap positions at the codon level. There are two important pitfalls with this approach. Firstly, any premature stop codon impedes using such a strategy. Secondly, each sequence is translated with the same reading frame from beginning to end, so that the presence of a single additional nucleotide leads to both aberrant translation and alignment.
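The second pitfall can be illustrated with a short sketch. This is not part of MACSE. It assumes the standard genetic code and a fixed reading frame starting at the first nucleotide, and the function name `translate` and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate in a single fixed frame; trailing 1-2 bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


original = "ATGGCCAAATGA"      # codons ATG GCC AAA TGA
shifted = "ATGGGCCAAATGA"      # one extra G inserted after the first codon

print(translate(original))     # MAK*
print(translate(shifted))      # MGQM
```

A single inserted nucleotide changes every downstream codon, so the two translations share only their first residue. Aligning the amino acid translations would therefore misplace the rest of the sequence.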